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Abstract 

When evaluating causal influence from one time series to another in a multi- 
variate dataset it is necessary to take into account the conditioning effect of the 
other variables. In the presence of many variables, and possibly of a reduced 
number of samples, full conditioning can lead to computational and numerical 
problems. In this paper we address the problem of partial conditioning to a lim- 
ited subset of variables, in the framework of information theory. The proposed 
approach is tested on simulated datasets and on an example of intracranial EEG 
recording from an epileptic subject. We show that, in many instances, condi- 
tioning on a small number of variables, chosen as the most informative ones for 
the driver node, leads to results very close to those obtained with a fully multi- 
variate analysis, and even better in the presence of a small number of samples. 
This is particularly relevant when the pattern of causalities is sparse. 

Introduction 



Determining how the brain is connected is a crucial point in neuroscience. To 
gain better understanding of which neurophysiological processes are linked to 
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which brain mechanisms, structural connectivity in the brain can be comple- 
mented by the investigation of statistical dependencies between distant brain 
regions (functional connectivity), or of models aimed to elucidate drive- response 
relationships (effective connectivity). Advances in imaging techniques guaran- 
tee an immediate improvement in our knowledge of structural connectivity. A 
constant computational and modelling effort has to be done in order to op- 
timize and adapt functional and effective connectivity to the qualitative and 
quantitative changes in data and physiological applications. The paths of infor- 
mation flow throughout the brain can shed light on its functionality in health 
and pathology. Every time that we record brain activity we can imagine that we 
are monitoring the activity at the nodes of a network. This activity is dynamical 
and sometimes chaotic. Dynamical networks [2] model physical and biological 
behaviour in many applications; also, synchronization in dynamical network is 
influenced by the topology of the network itself |6j. A great need exists for 
the development of effective methods of inferring network structure from time 
series data; a dynamic version of Bayesian Networks has been proposed in |13j . 
A method for detecting the topology of dynamical networks, based on chaotic 
synchronization, has been proposed in |26) . 

Granger causality has become the method of choice to determine whether 
and how two time series exert causal influences on each other [TS], [7]. This 
approach is based on prediction: if the prediction error of the first time series is 
reduced by including measurements from the second one in the linear regression 
model, then the second time series is said to have a causal influence on the 
first one. This frame has been used in many fields of science, including neural 
systems [16] , [5] , [24] , [22] , reo-chaos [11] and cardiovascular variability [T0| . 

From the beginning [14) , [25] , it has been known that if two signals are influ- 
enced by third one that is not included in the regressions, this leads to spurious 
causalities, so an extension to the multivariate case is in order. The conditional 
Granger causality analysis (CGCA) [l^ is based on a straightforward expansion 
of the autoregressive model to a general multivariate case including all measured 
variables. CGCA has been proposed to correctly estimate coupling in multivari- 
ate data sets [1] , [5] , [5] , [Hi • Sometimes though, a fully multivariate approach 
can entrain problems which can be purely computational but even conceptual: 
in presence of redundant variables the application of the standard analysis leads 
to under-estimation of causalities [1]. 

Several approaches have been proposed in order to reduce dimensionality 
in multivariate sets, relying on generalized variance [4], principal components 
analysis or Granger causality itself [18j . 

In this paper we will address the problem of partial conditioning to a limited 
subset of variables, in the framework of information theory. Intuitively, one may 
expect that conditioning on a small number of variables should be sufficient to 
remove indirect interactions if the connectivity pattern is sparse. We will show 
that this subgroup of variables might be chosen as the most informative for the 
driver variable, and describe the application to simulated examples and a real 
data set. 
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Materials and Methods 



We start by describing the connection between Granger causality and information- 
theoretic approaches hke the transfer entropy in (23) . Let {^n}n=i,.,N+m be a 
time series that may be approximated by a stationary Markov process of or- 
der m, i.e. p{^n\^n-l, ■ ■ ■ An-m) = p{Cn\^7i^l, ■ ■ ■ , (.n-m-l) ■ We will usc the 

shorthand notation Xi = (^i, . . . , and Xi — £,i+m, for i = 1,...,A^, 

and treat these quantities as N realizations of the stochastic variables X and x. 
The minimizer of the risk functional 

R[f]= j dXdx{x- f{X)fp{X,x) (1) 

represents the best estimate of x, given X, and corresponds [21* to the regression 
function f*{X) = J dxp{x\X)x. Now, let {rin}n=i,.,N+m be another time series 
of simultaneously acquired quantities, and denote Yi = {rji, . . . ,rii^rn~i)^ ■ The 
best estimate of x, given X and Y, is now: g*{X,Y) = J dxp{x\X,Y)x. If the 
generalized Markov property holds, i.e. 

p{x\X,Y)=p{x\X), (2) 

then f*{X) — g*{X,Y) and the knowledge of Y does not improve the predic- 
tion of X. Transfer entropy [53] is a measure of the violation of [5] it follows 
that Granger causality implies non-zero transfer entropy (20j . Under Gaussian 
assumption it can be shown that Granger causality and transfer entropy are 
entirely equivalent, and just differ for a factor two f^. The generalization of 
Granger causality to a multivariate fashion, described in the following, allows 
the analysis of dynamical networks [19 and to discern between direct and indi- 
rect interactions. 

Let us consider n time series {a;a(t)}Q=i, the state vectors are denoted 

Xait) = iXa{t-m),...,Xa{t- 1)), 

m being the window length (the choice of m can be done using the standard 
cross-validation scheme). Let e (a;Q.|X) be the mean squared error prediction of 
Xa on the basis of all the vectors X (corresponding to linear regression or non 
linear regression by the kernel approach described in |20|). The multivariate 
Granger causality index c{f3 — )■ a) is defined as follows: consider the prediction 
of Xa on the basis of all the variables but Xp and the prediction of Xa using all 
the variables, then the causality measures the variation of the error in the two 
conditions, i.e. 

Note that in '20' a different definition of causality has been used, 

^(/3^a)^ -(-"'^\f4\-;(f"'^) ; (4) 
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The two definitions are clearly related by a monotonic transformation: 



c(/3 ^ a) = - log [l-S{/3^ a)] . (5) 

Here we first evaluate the causality S{(3 a) using the selection of significative 
eigenvalues described in [TH] to address the problem of over-fitting in (U); then 
we use ([5]) and express our results in terms of c(/3 — J> a), because it is with this 
definition that causality is twice the transfer entropy, equal to /{xq; Xp\'K\Xp}^ 
in the Gaussian case ^ . 

Turning now to the central point of this paper, we address the problem 
of coping with a large number of variables, when the application of multivari- 
ate Granger causality may be questionable or even unfeasible, whilst bivariate 
causality would detect also indirect causalities. Here we show that conditioning 
on a small number of variables, chosen as the most informative for the candidate 
driver variable, is sufficient to remove indirect interactions for sparse connec- 
tivity patterns. Conditioning on a large number of variables requires an high 
number of samples in order to get reliable results. Reducing the number of 
variables, that one has to condition over, would thus provide better results for 
small data-sets. In the general formulation of Granger causality, one has no way 
to choose this reduced set of variables; on the other hand, in the framework of 
information theory, it is possible to individuate the most informative variables 
one by one. Once that it has been demonstrated [3 that Granger causality 
is equivalent to the information flow between Gaussian variables, partial con- 
ditioning becomes possible for Granger causality estimation; to our knowledge 
this is the first time that such approach is proposed. 

Concretely, let us consider the causality /3 — > a; we fix the number of vari- 
ables, to be used for conditioning, equal to nd- We denote Z = (^Xi-^ , ■ ■ ■ , 
the set of the Ud variables, in X \ Xp, most informative for Xp. In other words, 
Z maximizes the mutual information I{Xj3] Z} among all the subsets Z of Ud 
variables. Then, we evaluate the causality 

c(/3^a)^log '%"l^ly (6) 

Under the Gaussian assumption, the mutual information I{Xp; Z} can be eas- 
ily evaluated, see jS,. Moreover, instead of searching among all the subsets of 
Ud variables, we adopt the following approximate strategy. Firstly the mutual 
information of the driver variable, and each of the other variables, is estimated, 
in order to choose the first variable of the subset. The second variable of the 
subsets is selected among the remaining ones, as those that, jointly with the 
previously chosen variable, maximizes the mutual information with the driver 
variable. Then, one keeps adding the rest of the variables by iterating this 
procedure. Calling Zfe_i the selected set of A; — 1 variables, the set Z^, is ob- 
tained adding , to Z^-i, the variable, among the remaining ones, with greatest 
information gain. This is repeated until Ud variables are selected. This greedy 
algorithm, for the selection of relevant variables, is expected to give good results 
under the assumption of sparseness of the connectivity. 
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Results and Discussion 



Simulated data 

Let us consider linear dynamical systems on a lattice of n nodes, with equations, 
for i — 1^ . . . ,n: 



where a's are the couplings, s is the strength of the noise and r's are unit vari- 
ance i.i.d. Gaussian noise terms. The level of noise determines the minimal 
amount of samples needed to assess that the structures recovered by the pro- 
posed approachare genuine and are not due to randomness, as it happens for the 
standard Granger causality (see discussions in [5D] and [12]); in particular noise 
should not be too high to obscure deterministic effects. Firstly we consider a 
directed tree of 16 nodes depicted in figure ([T]); we set a^j equal to 0.9 for each 
directed link of the graph thus obtained, and zero otherwise. We set s — 0.1. 
In figure we show the application of the proposed methodology to data sets 
generated by eqs. (O, 100 samples long, in terms of quality of the retrieved 
network, expressed in terms of sensitivity (the percentage of existing links that 
are detected) and specificity (the percentage of missing links that are correctly 
recognized as non existing). The bivariate analysis provides 100% sensitivity 
and 92% specificity. However conditioning on a few variables is sufficient to put 
in evidence just the direct causalities while still obtaining high values of sensi- 
tivity. The full multivariate analysis (obtained as Ud tends to 16) gives here a 
rather low sensitivity, due to the low number of samples. This is a clear example 
where conditioning on a small number of variables is better than conditioning 
on all the variables at hand. 

As another example, we now fix n = 34 and construct couplings in terms of 
the well known Zachary data set [57], an undirected network of 34 nodes. We 
assign a direction to each link, with equal probability, and set a^j equal to 0.015, 
for each link of the directed graph thus obtained, and zero otherwise. The noise 
level is set s = 0.5. The network is displayed in figure ([3]): the goal is again to 
estimate this directed network from the measurements of time series on nodes. 

In figure (Hj) we show the application of the proposed methodology to data 
sets generated by eqs. ([7|), in terms of sensitivity and specificity, for different 
numbers of samples. The bivariate analysis detects several false interactions, 
however conditioning on a few variables is sufficient to put in evidence just the 
direct causalities. Due to the sparseness of the underlying graph, we get a result 
which is very close to the one by the full multivariate analysis; the multivariate 
analysis here recovers the true network, indeed the number of samples is suffi- 
ciently high. In figure ([5]) , concerning the stage of selection of variables upon 
which conditioning, we plot the mutual information gain as a function of the 
number of variables included rid'- it decreases as Ud increases. 



n 




(7) 



i=i 
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Figure 1: A directed rooted tree of 16 nodes. 
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Figure 2: The sensitivity (top) and the specificity (bottom) arc plotted versus 
rid, the number of variables selected for conditioning, for the first example, the 
rooted tree. The number of samples N is 100 and the order is rn = 2; similar 
results are obtained varying m. The results are averaged over 100 realizations 
of the linear dynamical system described in the text. The empty square, in cor- 
respondence to rid = 0, is the result from the bivariate analysis. The horizontal 
line is the outcome from multivariate analysis, where all variables are used for 
conditioning. 




Figure 3: The directed network of 34 nodes obtained assigning randomly a 
direction to links of the Zachary network. 
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Figure 4: Sensitivity and specificity are plotted versus na, the number of vari- 
ables selected for conditioning, for two values of two values of the number of 
samples N, 500 (left) and 1000 (right). The order is m = 2, similar results 
are obtained varying m. The results are averaged over 100 realizations of the 
linear dynamical system described in the text. The empty square, in corre- 
spondence to rifi = 0, is the result from the bivariate analysis. The horizontal 
line is the outcome from multivariate analysis, where all variables are used for 
conditioning. 
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Figure 5: The mutual information gain, when the (nd+l)-th variable is included, 
is plotted versus Ud for two values of the of the number of samples N, 500 (top) 
and 1000 (bottom). The order is m = 2. The information gain is averaged over 
all the variables. 



9 



EEG epilepsy data 

We consider now a real data set from an 8 x 8 electrode grid that was implanted in 
the cortical surface of the brain of a patient with epilepsy [T7] . We consider two 
10-seconds intervals prior to and immediately after the onset of a seizure, called 
respectively the preictal period and the ictal period. In figure ^ we show the 
application of our approach to the preictal period; we used the linear causality. 
The bivariate approach detects many causalities between the electrodes; most of 
them, however, are indirect. According to the multivariate analysis there is just 
one electrode which is observed to influence the others, even in the multivariate 
analysis: this electrode corresponds to a localized source of information and 
could indicate a putative epileptic focus. In (|6]) it is shown that conditioning 
on = 5 or rid = 20 variables provides the same pattern corresponding to the 
multivariate analysis, which thus appears to be robust. These results suggest 
that the effective connectivity is sparse in the preictal period. As a further 
confirmation, in ([T]) we plot the sum of all causalities detected as a function 
of the number of conditioning variables, for the preictal period; a plateau is 
reached already for small values of n^. 

In ([5]) the same analysis is shown w.r.t. the ictal period: in this case condi- 
tioning on nrf = 5 or = 20 variables does not reproduce the pattern obtained 
with the multivariate approach. The lack of robustness of the causality pattern 
w.r.t. rirf seems to suggest that the effective connectivity pattern, during the 
crisis, is not sparse. In ([9]) and ((T0| we show, for each electrode and for the 
preictal and ictal periods respectively, the total outgoing causality (obtained as 
the sum of the causalities on all the other variables). These pictures confirm 
the discussion above: looking at how the causality changes with Ud may provide 
information about the sparseness of the effective connectivity. 
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Figure 6: The causality analysis of the preictal period. The causality c{i — > j) 
corresponds to the row i and the column j. The order is chosen m = 6 according 
to the AIC criterion. Top left: bivariate analysis. Top right: our approach 
with Hd = 5 conditioning variables. Bottom left: our approach with Ud = 20 
conditioning variables. Bottom right: the multivariate analysis. 
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Figure 7: Concerning the preictal period, the sum of all causalities is plotted 
versus the number of conditioning variables 



12 



ict bivariate ict nd=5 




20 40 60 20 40 60 



Figure 8: The sum of outgoing causality from each electrode in the EEG ap- 
plication, ictal period. Top left: bivariate analysis. Top right: our approach 
with Hd — 5 conditioning variables. Bottom left: our approach with nd = 20 
conditioning variables. Bottom right: the multivariate analysis. 
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Figure 9: The sum of outgoing causality from each electrode in the EEG appli- 
cation, preictal period. Top left: bivariate analysis. Top right: our approach 
with rid = 5 conditioning variables. Bottom left: our approach with = 20 
conditioning variables. Bottom right: the multivariate analysis. 
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Figure 10: The causality analysis of the ictal period. The causality c{i — > j) 
corresponds to the row i and the column j. The order is chosen m = 6 according 
to the AIC criterion. Top left: bivariate analysis. Top right: our approach 
with rid = 5 conditioning variables. Bottom left: our approach with na = 20 
conditioning variables. Bottom right: the multivariate analysis. 



Conclusions 

Wc have addressed the problem of partial conditioning to a limited subset of 
variables while estimating causal connectivity, as an alternative to full condi- 
tioning, which can lead to computational and numerical problems. Analyzing 
simulated examples and a real data-set, wc have shown that conditioning on a 
small number of variables, chosen as the most informative ones for the driver 
node, leads to results very close to those obtained with a fully multivariate anal- 
ysis, and even better in the presence of a small number of samples, especially 
when the pattern of causalities is sparse. Moreover, looking at how causality 
changes with the number of conditioning variables provides information about 
the sparseness of the connectivity. 
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